<!DOCTYPE html>

<html>
<head>
<meta charset="UTF-8">
<link href="style.css" type="text/css" rel="stylesheet">
<title>VRNDSCALEPS—Round Packed Float32 Values To Include A Given Number Of Fraction Bits </title></head>
<body>
<h1>VRNDSCALEPS—Round Packed Float32 Values To Include A Given Number Of Fraction Bits</h1>
<table>
<tr>
<th>Opcode/Instruction</th>
<th>Op /En</th>
<th>64/32 bit Mode Support</th>
<th>CPUID Feature Flag</th>
<th>Description</th></tr>
<tr>
<td>
<p>EVEX.128.66.0F3A.W0 08 /r ib</p>
<p>VRNDSCALEPS xmm1 {k1}{z}, xmm2/m128/m32bcst, imm8</p></td>
<td>FV</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Rounds packed single-precision floating point values in xmm2/m128/m32bcst to a number of fraction bits specified by the imm8 field. Stores the result in xmm1 register. Under writemask.</td></tr>
<tr>
<td>
<p>EVEX.256.66.0F3A.W0 08 /r ib</p>
<p>VRNDSCALEPS ymm1 {k1}{z}, ymm2/m256/m32bcst, imm8</p></td>
<td>FV</td>
<td>V/V</td>
<td>AVX512VL AVX512F</td>
<td>Rounds packed single-precision floating point values in ymm2/m256/m32bcst to a number of fraction bits specified by the imm8 field. Stores the result in ymm1 register. Under writemask.</td></tr>
<tr>
<td>
<p>EVEX.512.66.0F3A.W0 08 /r ib</p>
<p>VRNDSCALEPS zmm1 {k1}{z}, zmm2/m512/m32bcst{sae}, imm8</p></td>
<td>FV</td>
<td>V/V</td>
<td>AVX512F</td>
<td>Rounds packed single-precision floating-point values in zmm2/m512/m32bcst to a number of fraction bits specified by the imm8 field. Stores the result in zmm1 register using writemask.</td></tr></table>
<h3>Instruction Operand Encoding</h3>
<table>
<tr>
<td>Op/En</td>
<td>Operand 1</td>
<td>Operand 2</td>
<td>Operand 3</td>
<td>Operand 4</td></tr>
<tr>
<td>FV</td>
<td>ModRM:reg (w)</td>
<td>ModRM:r/m (r)</td>
<td>Imm8</td>
<td>NA</td></tr></table>
<p><strong>Description</strong></p>
<p>Round the single-precision floating-point values in the source operand by the rounding mode specified in the imme-diate operand (see Figure 5-29) and places the result in the destination operand.</p>
<p>The destination operand (the first operand) is a ZMM register conditionally updated according to the writemask. The source operand (the second operand) can be a ZMM register, a 512-bit memory location, or a 512-bit vector broadcasted from a 32-bit memory location.</p>
<p>The rounding process rounds the input to an integral value, plus number bits of fraction that are specified by imm8[7:4] (to be included in the result) and returns the result as a single-precision floating-point value.</p>
<p>It should be noticed that no overflow is induced while executing this instruction (although the source is scaled by the imm8[7:4] value).</p>
<p>The immediate operand also specifies control fields for the rounding operation, three bit fields are defined and shown in the “Immediate Control Description” figure below. Bit 3 of the immediate byte controls the processor behavior for a precision exception, bit 2 selects the source of rounding mode control. Bits 1:0 specify a non-sticky rounding-mode value (Immediate control table below lists the encoded values for rounding-mode field).</p>
<p>The Precision Floating-Point Exception is signaled according to the immediate operand. If any source operand is an SNaN then it will be converted to a QNaN. If DAZ is set to ‘1 then denormals will be converted to zero before rounding.</p>
<p>The sign of the result of this instruction is preserved, including the sign of zero.</p>
<p>The formula of the operation on each data element for VRNDSCALEPS is</p>
<p>ROUND(x) = 2<sup>-M</sup>*Round_to_INT(x*2<sup>M</sup>, round_ctrl),</p>
<p>round_ctrl = imm[3:0];</p>
<p>M=imm[7:4];</p>
<p>The operation of x*2<sup>M</sup> is computed as if the exponent range is unlimited (i.e. no overflow ever occurs).</p>
<p>VRNDSCALEPS is a more general form of the VEX-encoded VROUNDPS instruction. In VROUNDPS, the formula of the operation on each element is</p>
<p>ROUND(x) = Round_to_INT(x, round_ctrl),</p>
<p>round_ctrl = imm[3:0];</p>
<p>Note: EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.</p>
<p>Handling of special case of input values are listed in Table 5-22.</p>
<p><strong>Operation</strong></p>
<p>RoundToIntegerSP(SRC[31:0], imm8[7:0]) {</p>
<p>if (imm8[2] = 1)</p>
<p>rounding_direction (cid:197) MXCSR:RC</p>
<p>; get round control from MXCSR</p>
<p>else</p>
<p>rounding_direction (cid:197) imm8[1:0]</p>
<p>; get round control from imm8[1:0]</p>
<p>FI</p>
<p>M (cid:197) imm8[7:4]</p>
<p>; get the scaling factor</p>
<p>case (rounding_direction)</p>
<p>00: TMP[31:0] (cid:197) round_to_nearest_even_integer(2<sup>M</sup>*SRC[31:0])</p>
<p>01: TMP[31:0] (cid:197) round_to_equal_or_smaller_integer(2<sup>M</sup>*SRC[31:0])</p>
<p>10: TMP[31:0] (cid:197) round_to_equal_or_larger_integer(2<sup>M</sup>*SRC[31:0])</p>
<p>11: TMP[31:0] (cid:197) round_to_nearest_smallest_magnitude_integer(2<sup>M</sup>*SRC[31:0])</p>
<p>ESAC;</p>
<p>Dest[31:0] (cid:197) 2<sup>-M</sup>* TMP[31:0]</p>
<p>; scale down back to 2<sup>-M</sup></p>
<p>if (imm8[3] = 0) Then</p>
<p>; check SPE</p>
<p>if (SRC[31:0] != Dest[31:0]) Then</p>
<p>; check precision lost</p>
<p>set_precision()</p>
<p>; set #PE</p>
<p>FI;</p>
<p>FI;</p>
<p>return(Dest[31:0])</p>
<p>}</p>
<p>VRNDSCALEPS (EVEX encoded versions)</p>
<p>(KL, VL) = (4, 128), (8, 256), (16, 512)</p>
<p>IF *src is a memory operand*</p>
<p>THEN TMP_SRC (cid:197) BROADCAST32(SRC, VL, k1)</p>
<p>ELSE TMP_SRC (cid:197) SRC</p>
<p>FI;</p>
<p>FOR j (cid:197) 0 TO KL-1</p>
<p>i (cid:197) j * 32</p>
<p>IF k1[j] OR *no writemask*</p>
<p>THEN DEST[i+31:i] (cid:197) RoundToIntegerSP(TMP_SRC[i+31:i]), imm8[7:0])</p>
<p>ELSE</p>
<p>IF *merging-masking*</p>
<p>; merging-masking</p>
<p>THEN *DEST[i+31:i] remains unchanged*</p>
<p>ELSE</p>
<p>; zeroing-masking</p>
<p>DEST[i+31:i] (cid:197) 0</p>
<p>FI;</p>
<p>FI;</p>
<p>ENDFOR;</p>
<p>DEST[MAX_VL-1:VL] (cid:197) 0</p>
<p><strong>Intel C/C++ Compiler Intrinsic Equivalent</strong></p>
<p>VRNDSCALEPS __m512 _mm512_roundscale_ps( __m512 a, int imm);</p>
<p>VRNDSCALEPS __m512 _mm512_roundscale_round_ps( __m512 a, int imm, int sae);</p>
<p>VRNDSCALEPS __m512 _mm512_mask_roundscale_ps(__m512 s, __mmask16 k, __m512 a, int imm);</p>
<p>VRNDSCALEPS __m512 _mm512_mask_roundscale_round_ps(__m512 s, __mmask16 k, __m512 a, int imm, int sae);</p>
<p>VRNDSCALEPS __m512 _mm512_maskz_roundscale_ps( __mmask16 k, __m512 a, int imm);</p>
<p>VRNDSCALEPS __m512 _mm512_maskz_roundscale_round_ps( __mmask16 k, __m512 a, int imm, int sae);</p>
<p>VRNDSCALEPS __m256 _mm256_roundscale_ps( __m256 a, int imm);</p>
<p>VRNDSCALEPS __m256 _mm256_mask_roundscale_ps(__m256 s, __mmask8 k, __m256 a, int imm);</p>
<p>VRNDSCALEPS __m256 _mm256_maskz_roundscale_ps( __mmask8 k, __m256 a, int imm);</p>
<p>VRNDSCALEPS __m128 _mm_roundscale_ps( __m256 a, int imm);</p>
<p>VRNDSCALEPS __m128 _mm_mask_roundscale_ps(__m128 s, __mmask8 k, __m128 a, int imm);</p>
<p>VRNDSCALEPS __m128 _mm_maskz_roundscale_ps( __mmask8 k, __m128 a, int imm);</p>
<p><strong>SIMD Floating-Point Exceptions</strong></p>
<p>Invalid, Precision</p>
<p>If SPE is enabled, precision exception is not reported (regardless of MXCSR exception mask).</p>
<p><strong>Other Exceptions</strong></p>
<p>See Exceptions Type E2.</p></body></html>